
feat(knowledge): add embedding model selection and Cohere reranker #4349

Merged
waleedlatif1 merged 22 commits into staging from waleedlatif1/embedding-model-research
Apr 30, 2026

Conversation

@waleedlatif1 (Collaborator)

Summary

  • Add embedding model dropdown to KB creation (text-embedding-3-small/large, gemini-embedding-001)
  • Add per-search Cohere reranker (rerank-v4.0-pro/fast, rerank-v3.5) with BYOK + rotating key support
  • Wire embedding + rerank cost into billing/usage_log
  • Backwards compatible: reranker disabled by default; existing KBs default to text-embedding-3-small

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Apr 30, 2026

The latest updates on your projects.

1 Skipped Deployment: docs (skipped), updated Apr 30, 2026 8:30am UTC


@cursor

cursor Bot commented Apr 30, 2026

PR Summary

Medium Risk
Touches core knowledge ingestion/search flows, cost accounting, and introduces external Cohere calls; correctness depends on per-KB model consistency, tokenization differences, and new env/key configuration.

Overview
Knowledge bases are no longer hard-coded to text-embedding-3-small: creation now uses a server-configured embedding model (KB_EMBEDDING_MODEL) and embeds/chunking/search paths propagate the KB’s embeddingModel through embedding generation, token counting, stored chunk metadata, and cost calculation (including adding pricing for gemini-embedding-001).

Vector search gains optional Cohere reranking (rerankerEnabled/rerankerModel) that oversamples candidates (capped at 100), reranks results, returns an optional rerankerScore, and folds rerank cost details into the response; multi-KB vector searches are rejected when selected KBs have different embedding models (tag-only searches still work). Cohere API keys are added to env + rotating-key support, v1 and internal search endpoints are kept in sync via new tests, and docs/tool schemas are updated accordingly.
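The oversample-then-rerank shape described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the names `candidateTopK` and `applyRerank` are assumptions, while the min(100, topK x 4) cap and the defensive index filtering come from this summary.

```typescript
interface Candidate {
  content: string
  similarity: number
  rerankerScore?: number
}

// Oversample: fetch more vector candidates than requested, capped at 100
// (Cohere's rerank endpoint accepts at most 100 documents per call).
function candidateTopK(topK: number): number {
  return Math.min(100, topK * 4)
}

// Re-order candidates by the reranker's relevance scores, dropping any
// out-of-bounds indices a malformed response might contain, then cut
// the list back down to the requested topK.
function applyRerank(
  candidates: Candidate[],
  ranked: { index: number; relevance_score: number }[],
  topK: number
): Candidate[] {
  return ranked
    .filter((r) => r.index >= 0 && r.index < candidates.length)
    .slice(0, topK)
    .map((r) => ({ ...candidates[r.index], rerankerScore: r.relevance_score }))
}
```

Oversampling matters because the reranker can only promote documents it sees: a plain top-10 vector search gives it nothing to reorder, while a capped 4x candidate pool lets genuinely relevant but lower-similarity chunks surface.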

Reviewed by Cursor Bugbot for commit 8fd0557.

@greptile-apps (Contributor)

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds multi-model embedding support (text-embedding-3-small/large, gemini-embedding-001) and a per-search Cohere reranker to the knowledge base feature. All previously flagged issues (Azure deployment model mismatch, out-of-bounds Cohere index, tag-only search guard, copilot KB creation, Gemini singular key fallback, and reranker cost on empty results) are addressed in the current HEAD.

Confidence Score: 5/5

Safe to merge; all P1 findings from prior review rounds are addressed and only minor P2 style issues remain

No P0 or P1 issues found in the current HEAD. The two inline comments are P2: one about a confusing cost.input field that conflates embedding and reranker costs (billing total is correct), and one about missing isNull(deletedAt) in a chunk-update KB lookup (practical impact is negligible). All substantive bugs flagged in previous review rounds have been fixed.

apps/sim/app/api/knowledge/search/route.ts — cost.input conflation; apps/sim/lib/knowledge/chunks/service.ts — updateChunk deletedAt guard

Important Files Changed

  • apps/sim/app/api/knowledge/search/route.ts: Core search handler extended with Cohere reranker integration and per-KB embedding model routing; cost.input incorrectly absorbs rerankerCost alongside embedding cost
  • apps/sim/lib/knowledge/reranker.ts: New Cohere reranker client with BYOK support, timeout, retry, document capping at 100, and defensive index filtering
  • apps/sim/lib/knowledge/embeddings.ts: Refactored to support multiple embedding providers (OpenAI, Azure OpenAI, Gemini) with per-model resolution, BYOK, and L2 normalization for Gemini
  • apps/sim/lib/knowledge/embedding-models.ts: New registry for supported embedding models with provider, pricing, and tokenizer metadata
  • apps/sim/lib/knowledge/chunks/service.ts: createChunk and updateChunk updated to resolve the KB embedding model at runtime; updateChunk's KB lookup is missing the isNull(deletedAt) guard, unlike createChunk
  • apps/sim/providers/models.ts: Added gemini-embedding-001 pricing and a new RERANK_MODEL_PRICING table with a getRerankModelPricing helper
  • apps/sim/tools/knowledge/search.ts: Knowledge search tool extended with rerankerEnabled/rerankerModel parameters and updated cost destructuring to include reranker fields

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchRoute as SearchRoute
    participant EmbeddingAPI as EmbeddingProvider
    participant VectorDB as pgvector
    participant CohereAPI as CohereRerank

    Client->>SearchRoute: POST query + knowledgeBaseIds + rerankerEnabled
    SearchRoute->>SearchRoute: checkKnowledgeBaseAccess resolves embeddingModel per KB
    SearchRoute->>SearchRoute: reject if mixed embeddingModels and hasQuery
    SearchRoute->>EmbeddingAPI: generateSearchEmbedding using KB embeddingModel
    EmbeddingAPI-->>SearchRoute: queryVector 1536-dim
    SearchRoute->>VectorDB: vector search candidateTopK = min(100, topK x 4)
    VectorDB-->>SearchRoute: candidates
    alt rerankerEnabled and hasQuery and results not empty
        SearchRoute->>CohereAPI: rerank query + documents up to 100
        CohereAPI-->>SearchRoute: ranked index + relevance_score
        SearchRoute->>SearchRoute: filter invalid indices and map to SearchResult
        SearchRoute->>SearchRoute: set rerankBilled = true and compute rerankerCost
    end
    SearchRoute->>SearchRoute: compute embedding cost via calculateCost
    SearchRoute-->>Client: results with optional rerankerScore and cost breakdown

Reviews (12). Last reviewed commit: "fix(knowledge): prefer singular Cohere k..."

Comment thread apps/docs/content/docs/en/tools/knowledge.mdx Outdated
@waleedlatif1 (Collaborator, Author)

Addressed both P1 findings from Greptile in 5d1446f:

1. MDX docs literal (line 51) — replaced ${SUPPORTED_RERANKER_MODELS.join(', ')} with the actual model names so readers see rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5.

2. Reranker billing gap on 0-result responses — separated rerankBilled (Cohere API was successfully called and we owe the search unit) from rerankApplied (result ordering was actually replaced). Cost block in search/route.ts now keys on rerankBilled, so a successful Cohere call that returns 0 results still records the search unit in the usage log instead of being silently absorbed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
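The rerankBilled/rerankApplied split described above can be sketched as a tiny classifier. This is a minimal illustration, assuming a hypothetical `classifyRerank` helper; the PR's actual code keys its cost block on the flag rather than computing both in one place.

```typescript
interface RerankOutcome {
  rerankBilled: boolean  // Cohere API call succeeded, so the search unit is owed
  rerankApplied: boolean // result ordering was actually replaced
}

// A successful Cohere call that happens to return zero results still
// consumes a billable search unit, even though nothing gets reordered.
function classifyRerank(apiSucceeded: boolean, resultCount: number): RerankOutcome {
  return {
    rerankBilled: apiSucceeded,
    rerankApplied: apiSucceeded && resultCount > 0,
  }
}
```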
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
Comment thread apps/sim/lib/knowledge/reranker.ts
fix(knowledge): require explicit Azure deployment per OpenAI embedding model

Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was
routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with
text-embedding-3-large would be embedded by whatever model that deployment
serves while billing tracked 3-large pricing — and chunks ingested via Azure
versus queried via real OpenAI would land in mismatched vector spaces.

Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model.
Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy).
If no deployment is configured for the chosen model, route to direct OpenAI
instead of silently routing to the wrong deployment.

Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult
so the build passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
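The routing rule this commit describes can be sketched as a small resolver. The env var names come from the commit message; the `resolveEmbeddingRoute` helper and the `Route` type are illustrative assumptions, not the PR's exact code.

```typescript
type Route = { kind: 'azure'; deployment: string } | { kind: 'openai' }

function resolveEmbeddingRoute(
  model: string,
  env: Record<string, string | undefined>
): Route {
  // Each OpenAI embedding model must name its own Azure deployment.
  const perModel: Record<string, string | undefined> = {
    'text-embedding-3-small': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_SMALL,
    'text-embedding-3-large': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_LARGE,
  }
  const deployment =
    perModel[model] ??
    // Legacy fallback applies only to the original default model.
    (model === 'text-embedding-3-small' ? env.KB_OPENAI_MODEL_NAME : undefined)
  // No deployment configured for this model: route to direct OpenAI rather
  // than silently hitting whatever model a shared deployment serves.
  return deployment ? { kind: 'azure', deployment } : { kind: 'openai' }
}
```

The key property is the last line: an unconfigured model degrades to a correct (direct OpenAI) path instead of an incorrect (wrong vector space) one.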
Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.

Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
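The BYOK-aware shape this fix describes might look like the following sketch. `resolveCohereKey` and the `{ apiKey, isBYOK }` return are named in the message above; `rerankerCostToBill` and the argument order are assumptions for illustration.

```typescript
interface ResolvedKey {
  apiKey: string
  isBYOK: boolean // true when the workspace supplied its own Cohere key
}

function resolveCohereKey(
  workspaceKey: string | undefined,
  platformKey: string | undefined
): ResolvedKey {
  // A workspace-supplied key takes precedence and marks the call as BYOK.
  if (workspaceKey) return { apiKey: workspaceKey, isBYOK: true }
  if (platformKey) return { apiKey: platformKey, isBYOK: false }
  throw new Error('No Cohere API key configured')
}

// Billing keys on the flag: platform rerank cost applies only when the
// platform's own key was used for the call.
function rerankerCostToBill(resolved: ResolvedKey, platformCost: number): number {
  return resolved.isBYOK ? 0 : platformCost
}
```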
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/app/api/knowledge/search/route.ts
Comment thread apps/sim/app/api/knowledge/search/route.ts Outdated
fix(knowledge): match search tokenizer to embedding provider; remove dead var

Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model.
  For gemini-embedding-001 the cost was computed against an OpenAI-tokenized
  count, producing wrong input.tokens.prompt and (slightly) wrong cost.
  Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
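The derivation this fix describes reduces to a small lookup. The gemini-to-google mapping is stated elsewhere in this PR; the registry shape and `tokenizerProviderFor` helper here are illustrative assumptions.

```typescript
// Which upstream provider each embedding model belongs to.
const EMBEDDING_PROVIDERS: Record<string, 'openai' | 'gemini'> = {
  'text-embedding-3-small': 'openai',
  'text-embedding-3-large': 'openai',
  'gemini-embedding-001': 'gemini',
}

// Derive the tokenizer provider from the model's provider instead of
// hardcoding 'openai' everywhere; unknown models default to 'openai'.
function tokenizerProviderFor(model: string): 'openai' | 'google' {
  return EMBEDDING_PROVIDERS[model] === 'gemini' ? 'google' : 'openai'
}
```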
@waleedlatif1 (Collaborator, Author)

@cursor review

@waleedlatif1 (Collaborator, Author)

@greptile

Comment thread apps/sim/lib/knowledge/chunks/service.ts
Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer
when computing the stored tokenCount. For KBs using gemini-embedding-001 the
count was estimated with the wrong heuristic, leading to inaccurate stored
counts (and any billing derived from them). Now derive the tokenizer from
the KB's embedding model provider, matching the search route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(knowledge): resolve type errors and unhandled rejection in search routes

- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
URLs end up in server access logs, proxy logs, and APM tools, so embedding
the key as a query param risks accidental exposure. Google explicitly
recommends the header form for the Gemini REST API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
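A sketch of the header-based form this commit switches to. The `x-goog-api-key` header and the batchEmbedContents endpoint follow Google's public Gemini REST API; the `geminiEmbedRequest` helper and exact request shape here are assumptions for illustration.

```typescript
// Build (but don't send) a Gemini batch-embedding request, putting the
// API key in a header so it never appears in access/proxy/APM logs.
function geminiEmbedRequest(apiKey: string, model: string, texts: string[]) {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:batchEmbedContents`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Header auth instead of `?key=...` in the URL.
        'x-goog-api-key': apiKey,
      },
      body: JSON.stringify({
        requests: texts.map((text) => ({
          model: `models/${model}`,
          content: { parts: [{ text }] },
        })),
      }),
    },
  }
}
```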
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts Outdated
Restore the prior fallback so existing Azure deployments — which conventionally
name the deployment after the model — continue to route through Azure when
KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell
through to direct OpenAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
Comment thread apps/sim/lib/knowledge/embeddings.ts
fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback

- Gemini's batchEmbedContents API rejects requests with more than 100
  items. The token-based batcher could pack hundreds of short chunks
  into a single request, causing 400s. Add maxItemsPerRequest on
  ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a
  fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
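The further split this commit adds can be sketched as follows. `maxItemsPerRequest` is named in the message above; the `splitByItemCap` helper is an illustrative assumption layered on top of whatever token-based batching already ran.

```typescript
// After token-based batching, split any batch that exceeds the provider's
// per-request item cap (Gemini's batchEmbedContents rejects >100 items).
// An undefined cap means the provider has no item limit.
function splitByItemCap<T>(batch: T[], maxItemsPerRequest?: number): T[][] {
  if (!maxItemsPerRequest || batch.length <= maxItemsPerRequest) return [batch]
  const out: T[][] = []
  for (let i = 0; i < batch.length; i += maxItemsPerRequest) {
    out.push(batch.slice(i, i + maxItemsPerRequest))
  }
  return out
}
```

This matters precisely for the failure mode described: many short chunks fit comfortably under a token budget while still blowing past the 100-item limit.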
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/reranker.ts
Match resolveOpenAIKey/resolveGeminiKey order: check the singular
COHERE_API_KEY before falling back to rotating keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
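The resolution order this commit aligns the three providers on can be sketched generically. The env var names (COHERE_API_KEY, COHERE_API_KEY_1/2/3) come from this PR; the `resolveProviderKey` helper and random rotation strategy are assumptions.

```typescript
// Resolve a provider key: singular key first, then one of the rotating keys.
function resolveProviderKey(
  env: Record<string, string | undefined>,
  singularName: string,
  rotatingNames: string[]
): string | undefined {
  // 1. Prefer the singular key (e.g. COHERE_API_KEY) when set.
  if (env[singularName]) return env[singularName]
  // 2. Otherwise pick one of the configured rotating keys (e.g. COHERE_API_KEY_1/2/3).
  const available = rotatingNames
    .map((name) => env[name])
    .filter((key): key is string => !!key)
  if (available.length === 0) return undefined
  return available[Math.floor(Math.random() * available.length)]
}
```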
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 8fd0557.

Comment thread apps/sim/app/api/knowledge/search/route.ts
@waleedlatif1 waleedlatif1 merged commit d94f4c9 into staging Apr 30, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/embedding-model-research branch April 30, 2026 08:46
waleedlatif1 added a commit that referenced this pull request Apr 30, 2026
feat(knowledge): add embedding model selection and Cohere reranker (#4349)

* feat(knowledge): add embedding model selection and Cohere reranker

* fix(knowledge): split reranker model constants into client-safe module

* fix(knowledge): bill rerank on every successful API call and fix MDX docs literal

* test(knowledge): align embedding tests with provider abstraction changes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): require explicit Azure deployment per OpenAI embedding model

Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was
routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with
text-embedding-3-large would be embedded by whatever model that deployment
serves while billing tracked 3-large pricing — and chunks ingested via Azure
versus queried via real OpenAI would land in mismatched vector spaces.

Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model.
Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy).
If no deployment is configured for the chosen model, route to direct OpenAI
instead of silently routing to the wrong deployment.

Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult
so the build passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): skip platform reranker billing for BYOK Cohere keys

Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.

Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): match search tokenizer to embedding provider; remove dead var

Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model.
  For gemini-embedding-001 the cost was computed against an OpenAI-tokenized
  count, producing wrong input.tokens.prompt and (slightly) wrong cost.
  Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): match chunk tokenizer to KB embedding provider

Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer
when computing the stored tokenCount. For KBs using gemini-embedding-001 the
count was estimated with the wrong heuristic, leading to inaccurate stored
counts (and any billing derived from them). Now derive the tokenizer from
the KB's embedding model provider, matching the search route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(knowledge): centralize tokenizer mapping on EmbeddingModelInfo

Add tokenizerProvider directly to EmbeddingModelInfo so callers read it
from the registry instead of reimplementing the gemini→google / openai→openai
map at each call site. Removes the local helper in chunks/service.ts and
the inline ternary in search/route.ts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(knowledge): lock embedding model to KB_EMBEDDING_MODEL env var

Remove the user-facing model picker from the KB create modal and the
embeddingModel field from the create/update API schemas. The active model
is now selected server-side via KB_EMBEDDING_MODEL, which collapses Azure
routing to a single deployment (KB_OPENAI_MODEL_NAME) and drops the
per-model AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_* env vars and
SUPPORTED_EMBEDDING_MODEL_IDS / UI-only label+description registry fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): use provider tokenizer for chunks and bound rerank indices

- documents/service.ts: replace ceil(len/4) heuristic with estimateTokenCount using the embedding model's tokenizerProvider so token counts match billing
- reranker.ts: filter Cohere rerank results to valid indices before mapping to defend against malformed responses
- utils.test.ts: add embeddingModel to kb fixture so getEmbeddingModelInfo resolves

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): use .count from estimateTokenCount return value

estimateTokenCount returns a TokenEstimate object, not a number — access
.count so the integer token count is stored instead of an object.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): only enforce single embedding model when query is present

Tag-only searches don't generate a query embedding, so two KBs with
different embedding models can be filtered together. Gate the guard on
hasQuery so cross-model tag-only queries no longer 400.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
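The gate this commit describes reduces to one condition. A minimal sketch, assuming a hypothetical `validateSearch` helper; only the hasQuery-and-mixed-models rule comes from the commit message.

```typescript
// Reject a search only when a query embedding is needed AND the selected
// KBs span more than one embedding model (and hence vector space).
// Tag-only searches never embed the query, so mixed models are fine there.
function validateSearch(
  kbEmbeddingModels: string[],
  hasQuery: boolean
): { ok: boolean; error?: string } {
  const distinct = new Set(kbEmbeddingModels)
  if (hasQuery && distinct.size > 1) {
    return { ok: false, error: 'Selected knowledge bases use different embedding models' }
  }
  return { ok: true }
}
```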

* fix(knowledge): use getConfiguredEmbeddingModel in copilot KB creation

Copilot-created KBs were hardcoded to text-embedding-3-small, ignoring
KB_EMBEDDING_MODEL. This caused cross-KB searches mixing copilot- and
API-created KBs to hit the embedding-model-mismatch guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): make EMBEDDING_DIMENSIONS a literal type

CreateKnowledgeBaseData.embeddingDimension is typed as the literal 1536,
so EMBEDDING_DIMENSIONS needs `as const` to satisfy it after the copilot
path switched to passing the constant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
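A minimal illustration of the literal-type point above. The constant and field names follow the commit; the surrounding interface is an assumed simplification of CreateKnowledgeBaseData.

```typescript
// Without `as const`, TypeScript widens the constant's type to `number`,
// which no longer satisfies a field typed as the literal 1536.
const EMBEDDING_DIMENSIONS = 1536 as const

interface CreateKnowledgeBaseData {
  embeddingDimension: 1536
}

const data: CreateKnowledgeBaseData = {
  // Compiles only because EMBEDDING_DIMENSIONS has the literal type 1536.
  embeddingDimension: EMBEDDING_DIMENSIONS,
}
```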

* fix(knowledge): use per-KB embedding model in v1 search route

The v1 search endpoint was passing undefined to generateSearchEmbedding,
which silently fell back to text-embedding-3-small. KBs created while
KB_EMBEDDING_MODEL=gemini-embedding-001 (or any non-default) would have
their queries embedded with the wrong model. Now resolves the model from
the KB rows like the internal route, with the same multi-model guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(knowledge): polish embedding/reranker implementation

- Drop unused supportsCustomDimensions from EmbeddingModelInfo (every
  registered model supports it; OpenAI/Azure paths now always send
  dimensions: 1536).
- Type SUPPORTED_EMBEDDING_MODELS as Partial<Record<...>> so index lookups
  surface as possibly-undefined in the type system instead of relying on
  runtime null checks alone.
- Require AZURE_OPENAI_API_VERSION in the Azure routing gate. Missing
  api-version no longer slips through as ?api-version=undefined; it now
  falls back to direct OpenAI.
- Use the embedding provider's tokenizer (estimateTokenCount) for the
  Gemini fallback token estimate instead of len/4, so billing matches
  the model's tokenization.
- Drop unreachable 'text-embedding-3-small' fallback in the manual chunk
  upload route — accessCheck.knowledgeBase is non-null after the access
  guard.
- docs-chunker now reads getConfiguredEmbeddingModel() so Sim's docs
  ingestion respects KB_EMBEDDING_MODEL like the user-facing paths.
- Add v1 search route test covering per-KB model resolution and the
  cross-KB mixed-model rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): resolve type errors and unhandled rejection in search routes

- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): pass Gemini API key via x-goog-api-key header

URLs end up in server access logs, proxy logs, and APM tools, so embedding
the key as a query param risks accidental exposure. Google explicitly
recommends the header form for the Gemini REST API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): default Azure deployment name to embedding model name

Restore the prior fallback so existing Azure deployments — which conventionally
name the deployment after the model — continue to route through Azure when
KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell
through to direct OpenAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback

- Gemini's batchEmbedContents API rejects requests with more than 100
  items. The token-based batcher could pack hundreds of short chunks
  into a single request, causing 400s. Add maxItemsPerRequest on
  ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a
  fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): prefer singular Cohere key before rotation

Match resolveOpenAIKey/resolveGeminiKey order: check the singular
COHERE_API_KEY before falling back to rotating keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>